Abstract. Our aim is to build a set of rules, such that reasoning over
temporal dependencies within gene regulatory networks is possible. The
underlying transitions may be obtained by discretizing observed time
series, or they are generated based on existing knowledge, e.g. by Boolean
networks or their nondeterministic generalization. We use the mathematical
discipline of formal concept analysis (FCA), which has been applied
successfully in domains such as knowledge representation, data mining or
software engineering. By the attribute exploration algorithm, an expert or a
supporting computer program is enabled to decide about the validity of
a minimal set of implications and thus to construct a sound and complete
knowledge base. From this all valid implications are derivable that
relate to the selected properties of a set of genes. We present results of
our method for the initiation of sporulation in Bacillus subtilis. However,
the formal structures are exhibited in a most general manner. Therefore
the approach may be adapted to signal transduction or metabolic networks,
as well as to discrete temporal transitions in many biological and
nonbiological areas.
1 Introduction

As the mathematical methodology of formal concept analysis (FCA) is little
known within systems biology, we give a short overview of its history and
purposes. During the early 1980s, FCA emerged within the community of set
and order theorists, algebraists and discrete mathematicians. Its first aim was to
find a new, concrete and meaningful approach to the understanding of complete
lattices (ordered sets such that for every subset the supremum and the infimum
exist). The following discovery proved to be very fruitful: Every complete
lattice is representable as a hierarchy of concepts, which were conceived as sets of
objects sharing a maximal set of attributes. This paved the way for using the
developed field of lattice theory for a transparent and complete representation of
very different types of knowledge. FCA was inspired by the pedagogue Hartmut
von Hentig [7] and his program of restructuring sciences, with a view to
interdisciplinary collaboration and democratic control. The philosophical background
goes back to Charles S. Peirce (1839 - 1914), who condensed some of his main 
ideas to the pragmatic maxim: 

Consider what effects, that might conceivably have practical bear- 
ings, we conceive the objects of our conception to have. Then, our 
conception of these effects is the whole of our conception of the 
object. [14, 5.402] 

In that tradition, FCA aims at unfolding the observable, elementary proper- 
ties defining the objects subsumed by scientific concepts. If applied to temporal 
transitions, effects of homogeneous classes of states can be modeled and pre- 
dicted in a clear and concise manner. Thus FCA seems to be appropriate to 
describe causality - and the limits of its understanding. 

At present, FCA is a richly developed mathematical theory, and there are
practical applications in various fields such as data and text mining, knowledge
management, semantic web, software engineering or economics [3]. FCA has been
used for the analysis of gene expression data in [2] and [13], but this is the
first approach of applying it to model (gene) regulatory networks. The
mathematical framework of FCA is very general and open, such that multifarious
refinements are possible, according to current approaches of modeling dynamics 
within systems biology. On the other hand, we developed a formal structure for 
general discrete temporal transitions. They occur in a variety of domains: control 
of engineering processes, development of the values of variables or objects in a 
computer program, change of interactions in social networks, a piece of music, 
etc. 

In this paper, however, the examples are exclusively biological. The purpose is
to construct a knowledge base for reasoning about temporal dependencies within
gene regulatory or signal transduction networks, by the attribute exploration
algorithm: For a given set of interesting properties, it builds a sound, complete
and nonredundant knowledge base. This minimal set of rules has to be checked
by an expert or a computer program, e.g. by comparison of knowledge based
predictions with data.

Since there exist relatively fixed thresholds of activation for many genes, it 
is a common abstraction to consider only two expression levels off and on. The 
classical approach of Boolean networks [8] is able to capture essential dynamic 
aspects of regulatory networks. Our present work is based on it, which also makes 
it possible to use standard mathematical and logical derivations for deciding 
many rules automatically and for a scaling up to larger networks. Nevertheless, 
the introduction of more fine-grained expression levels remains possible, e.g. 
in the sense of qualitative reasoning [11]. Further, this work is influenced by
computation tree logic [1], automata theory and an FCA modeling of temporal
transitions in [15]. Temporal concept analysis as developed by K.E. Wolff [3,
p. 127-148] is more directed toward a structured visualization of experimental
time series than toward temporal logic. We applied it to the analysis of gene
expression data in [19].



In Section 2, the general mathematical framework will be developed. Section 
3 gives results for a B. subtilis Boolean network. In Section 4, we will discuss the 
potential of the method and make some proposals for improving it by solving 
mathematical problems which have emerged from the applications. 

2 Methods 

2.1 Fundamental Structures of Formal Concept Analysis 

One of the classical aims of FCA is the structured, compact but complete visual- 
ization of a data set by a conceptual hierarchy. We briefly introduce its basic def- 
initions; for an easy example see http://www.upriss.org.uk/fca/fca.html. 

Definition 1. A formal context $(G, M, I)$ defines a relation $I \subseteq G \times M$
between objects from a set $G$ and attributes from a set $M$. The set of the attributes
common to all objects in $A \subseteq G$ is denoted by the $'$-operator:

$$A' := \{m \in M \mid g\,I\,m \text{ for all } g \in A\}.$$

The set of the objects sharing all attributes in $B \subseteq M$ is

$$B' := \{g \in G \mid g\,I\,m \text{ for all } m \in B\}.$$

Definition 2. A formal concept of the context $(G, M, I)$ is a pair $(A, B)$ with
$A \subseteq G$, $B \subseteq M$, $A' = B$ and $B' = A$. $A$ is the extent, $B$ the intent of the
concept $(A, B)$.

Thus a formal context $(G, M, I)$ is a special, but universally applicable type
of data table, provided with two operators $C_M : \mathcal{P}(M) \to \mathcal{P}(M)$, $B \subseteq M \mapsto B''$
and $C_G : \mathcal{P}(G) \to \mathcal{P}(G)$, $A \subseteq G \mapsto A''$. It is easy to see that they are
closure operators, with the properties monotony, extension and idempotency [4,
Definition 14]. It follows that the set of all extents resp. intents of a formal
context is a closure system, i.e. it is closed under intersections [4, Theorem 1].
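The two derivation operators and the resulting concepts can be illustrated by a minimal Python sketch. The toy context and all function names (`common_attributes`, `common_objects`, `closure`, `concepts`) are hypothetical and chosen for illustration only; the brute-force enumeration is feasible for very small attribute sets and is not the algorithm used by FCA tools.

```python
from itertools import combinations

# Toy formal context (hypothetical data): each object maps to its attribute set.
CONTEXT = {
    "g1": {"a", "b"},
    "g2": {"b", "c"},
    "g3": {"a", "b", "c"},
}
ATTRIBUTES = {"a", "b", "c"}


def common_attributes(objects):
    """A': the attributes shared by every object in `objects`."""
    result = set(ATTRIBUTES)
    for g in objects:
        result &= CONTEXT[g]
    return result


def common_objects(attrs):
    """B': the objects that have every attribute in `attrs`."""
    return {g for g, ms in CONTEXT.items() if set(attrs) <= ms}


def closure(attrs):
    """B'': the closure operator on attribute sets."""
    return common_attributes(common_objects(attrs))


def concepts():
    """All formal concepts (extent, intent), found by closing every attribute subset."""
    found = set()
    names = sorted(ATTRIBUTES)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            intent = frozenset(closure(subset))
            extent = frozenset(common_objects(intent))
            found.add((extent, intent))
    return found
```

Every pair returned by `concepts()` satisfies $A' = B$ and $B' = A$; ordering the pairs by inclusion of their extents gives the concept lattice of the toy context.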

Formal concepts can be ordered by set inclusion of the extents or - dually, 
with the inverse order relation - of the intents. With this order, the set of all 
concepts of a given formal context is a complete lattice [4, Theorem 3] (Figure 

During the interactive attribute exploration algorithm [4, p. 85ff.], an expert
is asked about the general validity of basic implications $A \to B$ between the
attributes of a given formal context $(G, M, I)$. An implication has the meaning:
"If an object $g \in G$ has all attributes $a \in A \subseteq M$, then it also has all attributes
$b \in B \subseteq M$." If the expert denies, s/he must provide a counterexample, i.e. a new
object of the context. If s/he accepts, the implication is added to the stem base of
the - possibly enlarged - context. A theorem by Duquenne and Guigues [4, Theorem
8] ensures that every implication semantically valid in the underlying formal
context can be derived syntactically from this minimal set by the Armstrong
rules [4, Proposition 21]. In many applications, one is merely interested in the
implicational logic of a given formal context, and there is no need for an expert
to confirm the implications.
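The two questions that arise during an exploration, namely whether an implication holds in a context and what follows from a set of accepted implications, can be sketched in a few lines of Python. The toy context, the attribute names and the functions `holds`, `counterexamples` and `close_under` are hypothetical; forward chaining is used here as a simple stand-in for syntactic derivation by the Armstrong rules.

```python
# Toy context (hypothetical): transitions as objects, attribute strings as attributes.
CONTEXT = {
    "t1": {"x.in.on", "y.out.on"},
    "t2": {"x.in.on", "y.out.on", "z.out.off"},
    "t3": {"x.in.off", "z.out.off"},
}


def holds(premise, conclusion, context=CONTEXT):
    """True if every object having all premise attributes also has the conclusion."""
    return all(conclusion <= attrs
               for attrs in context.values() if premise <= attrs)


def counterexamples(premise, conclusion, context=CONTEXT):
    """Objects that satisfy the premise but violate the conclusion."""
    return sorted(g for g, attrs in context.items()
                  if premise <= attrs and not conclusion <= attrs)


def close_under(attrs, implications):
    """Smallest superset of `attrs` that respects all implications (forward chaining)."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in implications:
            if premise <= closed and not conclusion <= closed:
                closed |= conclusion
                changed = True
    return closed
```

An implication $A \to B$ follows from a set of accepted implications exactly if $B$ is contained in `close_under(A, implications)`, which is how a query against a stem base can be answered.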



2.2 Constructing the Knowledge Base - Summary of the Method 

We start with two sets: 

— The universe E. The elements of E represent the entities of the world which 
we are interested in. 

— The set F (fluents) denotes changing properties of the entities. 

A state $\varphi \in G$ is an assignment of values in $F$ to the variables $e \in E$, hence it
is defined by a specific choice of attributes $m \in M \subseteq E \times F$.$^3$ By means of a state
context (Definition 3, Table 3 left part), temporal data can be translated into
the language of FCA. The dynamics is modeled by a binary relation $R \subseteq G \times G$
on the set of states, which gives rise to a transition context $K$ (Definition 4,
Table 1): the objects are transitions (elements of the relation) and the attributes
are the values of the entities defining the input and output state of a transition.
This data table may reflect observations repeated at different time points, or
the transitions may be generated by a dynamic model. As to the latter, we are
focusing here on Boolean networks, i.e. sets of Boolean functions for each entity
(Definition 5).

It is promising to consider the transitive closure of $R$. Objects of the
transitive context $K_t$ (Definition 6, Table 2) then are pairs of states such that the
output state emerges from the input state by some transition sequence of
arbitrary length. Finally we extend the state context $K_s$ by the temporal attributes
always(m), eventually(m) and never(m), which are determined by the given
transitions (Definition 3, Table 3).

The defined mathematical structures may be used in various ways. For in- 
stance, one could evaluate - i.e. generalize implications or reject them supposing 
outliers or by reason of special conditions - experimental time series by compar- 
ison with existing knowledge. Our general procedure is the following: 

1. Discretize a set of time series of gene expression measurements and transform
it to an observed transition context $K^{obs}$.

2. For a set of interesting genes, translate interactions from biological literature
and databases into a Boolean network.

3. Construct the transition context $K$ by a simulation starting from a set of
states, e.g. the initial states of $K^{obs}$ or all states (for small networks).

4. Derive the respective transitive contexts $K_t$ and $K_t^{obs}$.

5. Perform attribute exploration of $K_t$. Decide about an implication $A \to B$,
$A, B \subseteq M$, by checking its validity in $K_t^{obs}$ and/or by searching for
supplementary knowledge. Possibly provide a counterexample from $K_t^{obs}$.

6. Answer queries from the modified context $K_t$ and from its stem base.
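Step 1 of the procedure can be sketched as follows. The text does not fix a discretization method, so the simple per-gene threshold used here is an assumption, as are the function names `discretize` and `observed_transitions` and the example measurements.

```python
def discretize(series, threshold):
    """Map each measurement to 1 (on) if it reaches the threshold, else 0 (off)."""
    return [1 if value >= threshold else 0 for value in series]


def observed_transitions(expression, thresholds):
    """Turn per-gene time series into observed transitions between Boolean states.

    `expression` maps a gene name to its measurements over time and
    `thresholds` maps a gene name to its (assumed) activation threshold.
    Returns the gene order and the distinct pairs of consecutive states.
    """
    genes = sorted(expression)
    levels = {g: discretize(expression[g], thresholds[g]) for g in genes}
    length = len(levels[genes[0]])
    states = [tuple(levels[g][t] for g in genes) for t in range(length)]
    return genes, sorted(set(zip(states, states[1:])))
```

Each returned pair is one object of an observed transition context; states with identical values are identified, in line with footnote 3.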

3 Thus - as usual - states with the same variable values are identified. It would also be
possible to distinguish them as situations by introducing a new attribute, e.g. "time
interval".

In step 5, automatic decision criteria could be thresholds of support $q = |(A \cup B)'|$
and confidence $p = \frac{|(A \cup B)'|}{|A'|}$ for an implication in $K_t^{obs}$. A weak criterion is to
reject only implications with support 0 (but if no object in $K_t^{obs}$ has all attributes
from $A$, the implication is not violated). In [18], a strong criterion has been
applied: implications of $K_t$ had to be valid also in the observed context ($p = 1$).
This is equivalent to an exploration of the union of the two contexts.
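The support and confidence criteria can be computed directly from an observed context. The sketch below assumes the usual definitions, support as the number of objects having all attributes of premise and conclusion, and confidence as that number divided by the number of objects having the premise; the context and the function names are hypothetical.

```python
# Hypothetical observed context: transitions as objects, generic attribute names.
OBSERVED = {
    "t1": {"p", "q"},
    "t2": {"p", "q"},
    "t3": {"p"},
    "t4": {"r"},
}


def extent(attrs, context=OBSERVED):
    """Objects of the context that have every attribute in `attrs`."""
    return {g for g, ms in context.items() if attrs <= ms}


def support(premise, conclusion, context=OBSERVED):
    """Number of objects having all attributes of premise and conclusion."""
    return len(extent(premise | conclusion, context))


def confidence(premise, conclusion, context=OBSERVED):
    """Share of the objects with the premise that also have the conclusion.

    Returns None if no object has the premise: the implication is then
    not violated, but the data say nothing about it.
    """
    covered = len(extent(premise, context))
    if covered == 0:
        return None
    return support(premise, conclusion, context) / covered
```

With these helpers, the strong criterion corresponds to requiring `confidence(A, B) == 1` (or `None`), while a weaker criterion would accept any implication whose support or confidence exceeds a chosen threshold.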

In Section 3 we will analyse pure knowledge based simulations; the validation 
by data and experimental literature had been done before in [5]. For that reason, 
in step 5. the stem base is computed automatically, without further confirmation 
by an expert. 

Step 2. could be supported by text mining software. Then attribute explo- 
ration provides strong criteria of validation. We implemented the steps 1., 3. 
and 4. in R [www.r-project.org]. For step 5., we used the Java tool Concept Ex- 
plorer [http://sourceforge.net/projects/conexp]. The output was translated with 
R into a PROLOG knowledge base. The R scripts are available on request. 

2.3 Definition of the Relevant Formal Contexts 

With given sets $E$ and $F$, we define a state as a map $\varphi : E \to F$. To explore
static features of states, the following formal context$^4$ is defined.

Definition 3. Given two sets $E$ (entities) and $F$ (fluents), a state context is
a formal context $(G, M, I)$ with $G \subseteq F^E := \{\varphi : E \to F\}$ and $M \subseteq E \times F$; its
relation $I$ is given as $\varphi\, I\, (e, f) \iff \varphi(e) = f$, for all $\varphi \in G$, $e \in E$ and $f \in F$.

Definition 4. Given a state context $(G, M, I)$ and a relation $R \subseteq G \times G$, a
transition context $K$ is the context $(R, M \times \{in, out\}, I)$ with the property

$$\forall i \in \{in, out\} : (\varphi^{in}, \varphi^{out})\, I\, (e, f, i) \iff \varphi^{i}(e) = f. \qquad (1)$$

Transitions may be generated by a Boolean network:

Definition 5. Let $E$ be an arbitrary set of entities, $F := \{0, 1\}$ (fluents), and
states $G \subseteq F^E$. Then a transition function $F^E \to F^E$ is called a Boolean
network.

We will identify the elements of $F$ with $-$, $+$ or off, on respectively. This
definition is subsumed by the definition of a dynamic network in [12, Definition p. 34],
with a set of variables $E$ and state sets $X_1 = \dots = X_n = F$. We use a parallel
update schedule, i.e. the order relation on $E$ is empty. Boolean networks may
be generalized in order to include nondeterminism; then different output states
$\varphi^{out}$ are generated from a single input state $\varphi^{in}$ (see Section 3, compare [18]).

Definition 6. A transition context $K$ with a transitively closed relation $t(R) \subseteq G \times G$
is called a transitive context $K_t$.
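Definitions 4 to 6 can be traced on a small example. The following Python sketch uses a hypothetical two-gene Boolean network with parallel update, generates the relation $R$ from all states, and closes it transitively; the update rule and all names are assumptions made for illustration.

```python
from itertools import product

GENES = ("x", "y")  # hypothetical entities of a two-gene network


def update(state):
    """Parallel (synchronous) update of the toy network: x' = y, y' = x OR y."""
    x, y = state
    return (y, x | y)


def transitions(states):
    """The relation R as a set of (input state, output state) pairs."""
    return {(s, update(s)) for s in states}


def transitive_closure(relation):
    """Smallest transitively closed relation t(R) that contains R."""
    closure = set(relation)
    while True:
        new_pairs = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new_pairs <= closure:
            return closure
        closure |= new_pairs


def row(pair):
    """Attributes (entity, value, in/out) of one transition, as in a transition context."""
    s_in, s_out = pair
    return ({(g, v, "in") for g, v in zip(GENES, s_in)}
            | {(g, v, "out") for g, v in zip(GENES, s_out)})


ALL_STATES = list(product((0, 1), repeat=len(GENES)))
R = transitions(ALL_STATES)   # objects of the transition context
T_R = transitive_closure(R)   # objects of the transitive context
```

In this toy network the state (1, 0) reaches (0, 1) in one step and (1, 1) in two steps, so the transitive closure adds exactly one pair to `R`. The fixpoint loop is quadratic per pass and only meant for small relations.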
4 It is equivalent to a many-valued context with nominal scale [4, Section 1.3], [18,
p. 123]. For better readability, we draw the contexts in the latter form (Table 3).
Deriving a one-valued context according to Definition 3 is obvious: each many-valued
attribute $e$ is replaced by $\{(e, f) \mid f \in F\}$, e.g. SigA by SigA.off and SigA.on. If an
attribute $e$ takes exactly one of these values, negation of on and off is expressed.
Other kinds of scaling like the interordinal scale could be interesting, if there are
more than two levels ($|F| > 2$).



Table 1. A transition context for the states of Table 3, with all attributes that are
changing during the small simulation, as well as SpoOA and SpoOAP.

Table 2. The transitive context derived from the transition context of Table 1.

Definition 7. A state context $K = (G, M, I)$ is extended to a formal context
$K_s = (G, M \cup T, I \cup I_T)$ by a set of temporal attributes
$T := \{always(m) \mid m \in M\} \cup \{never(m) \mid m \in M\} \cup \{eventually(m) \mid m \in M\}$.
Let $\varphi^{in} \in G$, $m = (e, f)$, $e \in E$, $f \in F$, and $t(R) \subseteq G \times G$ a transitively closed relation. The
relation $I_T$ of $K_s$ then is defined as follows:

$$\varphi^{in}\, I_T\, always(m) \iff \forall (\varphi^{in}, \varphi^{out}) \in t(R) : \varphi^{out}(e) = f$$

$$\varphi^{in}\, I_T\, never(m) \iff \forall (\varphi^{in}, \varphi^{out}) \in t(R) : \varphi^{out}(e) \neq f$$

$$\varphi^{in}\, I_T\, eventually(m) \iff \exists (\varphi^{in}, \varphi^{out}) \in t(R) : \varphi^{out}(e) = f$$

For $B \subseteq T$, set $always(B) := \{always(b_1), \dots, always(b_i) \mid b_1, \dots, b_i \in B\}$, and
analogously $never(B)$ and $eventually(B)$.

The attributes will be abbreviated to alw(m), nev(m) and ev(m). In a
nondeterministic setting, alw(m) and nev(m) refer to all possible transition paths starting
from $\varphi^{in}$, ev(m) to the existence of a path.
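The temporal attributes of Definition 7 can be read off a transitively closed relation by quantifying over the outputs reachable from a state. The sketch below uses a small hand-written relation over two hypothetical genes; the function name `temporal_attributes` and the tuple encoding of attributes are illustrative choices.

```python
GENES = ("x", "y")  # hypothetical entities; a state is a tuple of their values

# A transitively closed relation t(R) over four states (toy data).
T_R = {
    ((1, 0), (0, 1)),
    ((1, 0), (1, 1)),
    ((0, 1), (1, 1)),
    ((1, 1), (1, 1)),
    ((0, 0), (0, 0)),
}


def temporal_attributes(state, t_r=T_R, genes=GENES, values=(0, 1)):
    """Temporal attributes of `state`, following the quantifiers of Definition 7.

    A state without outgoing pairs would satisfy `always` and `never`
    vacuously; the text assumes that all transitions are completely defined.
    """
    outputs = [out for (inp, out) in t_r if inp == state]
    attrs = set()
    for index, gene in enumerate(genes):
        for value in values:
            hits = [out[index] == value for out in outputs]
            if all(hits):
                attrs.add(("always", gene, value))
            if not any(hits):
                attrs.add(("never", gene, value))
            if any(hits):
                attrs.add(("eventually", gene, value))
    return attrs
```

For the state (1, 0), gene y is on in every reachable state, so `always(y, 1)` and `never(y, 0)` hold, whereas gene x takes both values and only the two `eventually` attributes apply.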



2.4 Dependency of Contexts and Background Knowledge 

Table 3. Left part: A state context corresponding to a simulation starting from a B.
subtilis state without nutritional stress (see Section 3.1, [16, Table 4]). +: on, -: off.
Right part: extension by temporal attributes. Here they are the same for all states,
since these reach the attractor (limit state cycle) $\{\varphi_1, \varphi_2\}$ after at most one time step.

In the following we will present first mathematical results that can improve
computability; they are not necessary for the understanding of the application
in Section 3. By entering background knowledge (not necessarily implications)
prior to an attribute exploration, the algorithm may be shortened considerably
[3, p. 101-113]. We searched for first order logic background formulas in order to
use the results of an attribute exploration for the exploration of the next context
in the hierarchy. Then the implications of the latter context are derivable from
this background knowledge and a reduced set of new implications. Also during
the exploration of one context, implications can be decided automatically based
on already accepted implications. In this way the expert is enabled to concentrate
on really interesting hypotheses. Thus, the implications of a state context hold
in the input and output part of the corresponding transition context (for an
example see [15, p. 149f.]). Related to transitive and extended state contexts,
the subsequent result holds:

Proposition 1. Let $K_s = (G, M \cup T, I \cup I_T)$ be an extended state context.
Suppose the relation $t(R) \subseteq G \times G$ is the object set of the transitive context
$K_t = (t(R), M \times \{in, out\}, I)$. Then the following entailments between implications of
both contexts are valid:

$$B^{in} \to m^{out} \text{ in } K_t \iff B \to always(m) \text{ in } K_s \qquad (2)$$

$$B^{in} \to m^{out} \text{ in } K_t \models B \to eventually(m) \text{ in } K_s \qquad (3)$$

$$B^{out} \to m^{out} \text{ in } K_t \models always(B) \to always(m) \text{ in } K_s \qquad (4)$$

$$B^{out} \to m^{out} \text{ in } K_t \models eventually(B) \to eventually(m) \text{ in } K_s \qquad (5)$$

$$B^{in} \cup \{m^{out}\} \to \bot \text{ in } K_t \models B \to never(m) \text{ in } K_s \qquad (6)$$

If the latter implication does not follow from the stem base of $K_t$, this is equivalent
to $B \to eventually(m)$ in $K_s$.



Proof. The proofs are straightforward from the definitions. □



In order to get a complete overview of valid entailments, as a first step we
performed rule exploration [20] of the following test context, i.e. exploration
of Horn rules instead of implications, thus variables are admitted: The objects
are all possible $K_t$ respectively the corresponding $K_s$, and the attributes are
the following classes of implications with "homogeneous" premises. Then the
explored rules for implications correspond to entailments valid for the semantics
given by the objects, the transitive contexts. The sets are nonempty subsets of
$M$, $m = (e, f)$, $f \in F := \{0, 1\}$, and $m \in B_0, C_0$ ($B_1, C_1$) $\Rightarrow$ $\varphi(e) = 0$ (1). We
suppose that all states and transitions are completely defined.



1. $B^{in} \to C^{in}$
2. $B^{in} \to C_0^{out} \equiv B^{in} \to nev(C_1)$
3. $B^{in} \to C_1^{out} \equiv B^{in} \to alw(C_1)$
4. $B^{in} \to ev(C_1)$
5. $B_0^{out} \to C^{in}$
6. $B_1^{out} \to C^{in}$
7. $ev(B_1) \to C^{in}$
8. $alw(B_1) \to C^{in}$
9. $nev(B_1) \to C^{in}$
10. $B_0^{out} \to C_0^{out}$
11. $B_0^{out} \to C_1^{out}$
12. $B_1^{out} \to C_0^{out}$
13. $B_1^{out} \to C_1^{out}$
14. $ev(B_1) \to ev(C_1)$
15. $ev(B_1) \to alw(C_1)$
16. $ev(B_1) \to nev(C_1)$
17. $alw(B_1) \to ev(C_1)$
18. $alw(B_1) \to alw(C_1)$
19. $alw(B_1) \to nev(C_1)$
20. $nev(B_1) \to ev(C_1)$
21. $nev(B_1) \to alw(C_1)$
22. $nev(B_1) \to nev(C_1)$


The equivalences in 2. and 3. follow from Proposition 1(2). Since the
implications comprising input attributes are independent from those related only to
output attributes, rule exploration was performed (almost) independently for
the first 9 and the remaining 13 implications. Results for the second part are
shown here.

The exploration started from a hypothetical context as single object of the
test context, where no implications were valid. Before, we had added 25 known
entailments as background rules (BR), like those of Proposition 1 or following
from the definitions, like $alw(B_1) \to ev(B_1)$. A counterexample represents a
significant class of contexts. They had to be chosen carefully, since an object not
having its maximal attribute set might preclude a valid entailment.$^5$ The
exploration resulted in the following stem base of only 14 entailments. Most of them
are background rules (they are accepted automatically during the exploration),
but not all of these are needed in order to derive all valid entailments between
the chosen implications. This demonstrates the effectiveness and minimality of the
algorithm. Entailments 5., 6., 7. and 10. were newly found.



1. $nev(B_1) \to alw(C_1) \models nev(B_1) \to ev(C_1)$ (BR 1)

2. $nev(B_1) \to ev(C_1),\ nev(B_1) \to nev(C_1) \models \bot$ (BR 11)

3. $alw(B_1) \to alw(C_1) \models alw(B_1) \to ev(C_1)$ (BR 2)

4. $alw(B_1) \to ev(C_1),\ alw(B_1) \to nev(C_1) \models \bot$ (BR 14)

5. $ev(B_1) \to nev(C_1),\ nev(B_1) \to nev(C_1) \models B^{in} \to C_0^{out},\ B_0^{out} \to C_0^{out},\ B_1^{out} \to C_0^{out}$

6. $ev(B_1) \to nev(C_1),\ alw(B_1) \to nev(C_1) \models B_1^{out} \to C_0^{out}$

7. $ev(B_1) \to nev(C_1),\ alw(B_1) \to ev(C_1) \models \bot$

8. $ev(B_1) \to alw(C_1) \models ev(B_1) \to ev(C_1),\ alw(B_1) \to ev(C_1),\ alw(B_1) \to alw(C_1)$ (BR 3)

9. $ev(B_1) \to ev(C_1) \models alw(B_1) \to ev(C_1)$ (BR 4)

10. $ev(B_1) \to ev(C_1),\ nev(B_1) \to ev(C_1) \models B^{in} \to C_1^{out},\ B_0^{out} \to C_1^{out},\ B_1^{out} \to C_1^{out},\ ev(B_1) \to alw(C_1)$

11. $B_1^{out} \to C_1^{out} \models ev(B_1) \to ev(C_1),\ alw(B_1) \to ev(C_1),\ alw(B_1) \to alw(C_1)$
(BR 4, BR 5 $\Leftarrow$ Proposition 1(4) (5))

12. $B_1^{out} \to C_0^{out} \models alw(B_1) \to nev(C_1)$ (BR 9 $\Leftarrow$ Proposition 1(4))

13. $B_0^{out} \to C_1^{out} \models nev(B_1) \to ev(C_1),\ nev(B_1) \to alw(C_1)$ (BR 1, 10 $\Leftarrow$ Proposition 1(4))

14. $B_0^{out} \to C_0^{out} \models nev(B_1) \to nev(C_1)$ (BR 6 $\Leftarrow$ Proposition 1(4))

5 Thus the attribute set of a counterexample must be a concept intent in the final test
context.

It remains to prove the rules of this stem base, which is easy; we are giving
some hints: BR 1, 2, 3, and 4 are based on $alw(A) \to ev(A)$, $A \subseteq M$, and BR 11
and 14 on $nev(C_1), ev(C_1) \to \bot$ ($\bot$ = set of all attributes, and the corresponding
object set is empty).

7.: Since $nev(C_1)$ and $ev(C_1)$ do not occur together by definition, the
combination of the two implications has support in the test context. In the underlying
contexts, the premise $alw(B_1)$ (a subcase of $ev(B_1)$) is no attribute of any state.
The implication $alw(B_1) \to \bot$ holds, which has not been considered explicitly.

10.: Inversely, in all possible cases the states / transitions have the attribute
$ev(C_1)$, and therefore also $alw(C_1)$ and $C_1^{out}$. Explicitly: $\top \to ev(C_1)$, $\top \to alw(C_1)$,
$\top \to C_1^{out}$. 5. is a parallel rule concerning $nev(C_1)$.

Rules 7. and 10. suggest that implications with empty premise $\top$ or
conclusion $\bot$ should be considered explicitly. If the counterexamples have maximal
attribute sets, as a conclusion it can be stated that we have derived a set of rules
representing a minimal, sound and complete entailment calculus for the selected
classes of implications for transition and state contexts.

3 Results: Sporulation in Bacillus subtilis 

In order to demonstrate the characteristics of the proposed method, we will apply 
it to a gene regulatory network assembled in [5] and transformed to a Petri net 
as well as a Boolean network in [16]. 

B. subtilis is a Gram-positive soil bacterium. Under extreme environmental
stress, it produces a single endospore, which can survive ultraviolet or gamma 
radiation, acid, hours of boiling or long periods of starvation. The bacterium 
leaves the vegetative growth phase in favour of a dramatically changed and 
highly energy consuming behaviour, and it dies at the end of the sporulation 
process. This corresponds to a switch between two clearly distinguished genetic 
programs, which are complex but comparatively well understood. 

By literature and database search, de Jong et al. [5] identified 12 main
regulators, constructed a model of piecewise linear differential equations and obtained
realistic simulation results. An exogenous signal (starvation) triggers the
phosphorylation of the transcription factor SpoOA to SpoOAP by the kinase KinA;
this process is reversible by the phosphatase SpoOE. SpoOAP is necessary to
transcribe SigF, which regulates genes initiating sporulation and therefore is an
output of the model. The interplay with other transcription factors AbrB, Hpr,
SigA, SigF, SigH and SinR is graphically represented in [5, Figure 3]; SinI
inactivates SinR by binding to it. SigA and Signal are considered as an input to
the model and are always on. Table 4 lists the Boolean equations building the
model in [16] (communicated by the author). They exhibit a certain degree of
nondeterminism, since the functions for the off fluents sometimes are not the
negation of the on functions. This accounts for incomplete or inconsistent
knowledge. In the case of state transitions determined by $k$ conflicting function pairs,
we generated $2^k$ output states.
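The generation of several output states from conflicting function pairs can be sketched as follows. The rules are hypothetical and much smaller than those of Table 4, and the reading of a "conflict" as a state in which the on function and the off function of an entity either both fire or both fail is an assumption of this sketch.

```python
from itertools import product

# Hypothetical (on_rule, off_rule) pairs; each rule maps a state dict to a truth value.
RULES = {
    "a": (lambda s: s["b"], lambda s: not s["b"]),         # off is the negation of on
    "b": (lambda s: s["a"], lambda s: s["a"] and s["b"]),  # pair conflicts in some states
}


def successors(state):
    """All output states of `state`; k conflicting pairs yield 2**k outputs."""
    options = []
    for gene, (on_rule, off_rule) in RULES.items():
        on, off = bool(on_rule(state)), bool(off_rule(state))
        if on != off:
            # Exactly one rule fires: the next value is determined.
            options.append([1 if on else 0])
        else:
            # Both or neither fire (assumed meaning of a conflict): keep both values.
            options.append([0, 1])
    return [dict(zip(RULES, combo)) for combo in product(*options)]
```

Each pair of an input state and one of its successors is then one transition of a nondeterministic transition context.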



Table 4. Boolean rules for the nutritional stress response regulatory network, derived
in [16] from [5]. $\bar{x} = \neg x$, $x + y = x \vee y$, $xy = x \wedge y$.

AbrB    = SigA AbrB SpoOAP
AbrB    = SigA + AbrB + SpoOAP
SigF    = (SigH SpoOAP SinR) + (SigH SpoOAP SinI)
SigF    = (SinR SinI) + SigH + SpoOAP
KinA    = SigH SpoOAP
KinA    = SigH + SpoOAP
SpoOA   = (SigH SpoOAP) + (SigA SpoOAP)
SpoOA   = (SigA SinR SinI) + (SigH SigA ) + SpoOAP
SpoOAP  = Signal SpoOA SpoOE KinA
SpoOAP  = Signal + SpoOA + SpoOE + KinA
SpoOE   = SigA AbrB
SpoOE   = SigA + AbrB
SigH    = SigA AbrB
SigH    = SigA + AbrB
Hpr     = SigA AbrB SpoOAP
Hpr     = SigA + AbrB + SpoOAP
SinR    = (SigA AbrB Hpr SinR SinI SpoOAP) + (SigA AbrB Hpr SinR SinI SpoOAP)
SinR    = SigA + AbrB + Hpr + (SinR SinI) + (SinR SinI) + SpoOAP)
SinI    = SinR
SinI    = SinR
SigA    = TRUE (input to the model)
Signal  = TRUE or FALSE (constant, depending on the input state)




3.1 Simulation Starting from a State Typical for the Vegetative 
Growth Phase 

We performed supplementary analyses of the transitions starting from a typical
state without the starvation signal [16, Table 4]. The concept lattice for the
resulting transitive context (Table 2, with a part of the attribute set only) is
shown in Figure 1. The larger circles at the bottom represent object concepts;
their extents (highlighted part) are the four single transitions with the input
state at t = 0 or t = 2, and the intents are all attributes above a concept.
Thus for instance, the two latter transitions have the attribute KinA.in.on in
common, designating the respective concept. Moreover, they are distinguished
unambiguously from other sets of transitions by this attribute - the concept is
generated by "KinA.in.on".



Fig. 1. Concept lattice (computed and drawn with Concept Explorer) representing
a simulation without nutritional stress. Signal: starvation; AbrB, Hpr, SigA, SigF,
SigH, SinR, SpoOA (phosphorylated form SpoOAP): transcription factors; KinA:
kinase; SpoOE: phosphatase; SinI inactivates SinR by binding to it. i-j indicates a
transition $(\varphi_i^{in}, \varphi_j^{out})$. Bold / blue lines: Filter (superconcepts) and ideal
(subconcepts) of the concept whose extent consists of four transitions and whose intent is
{AbrB.in.off, SigH.in.off, SpoOE.in.off, Hpr.in.on}.



Implications of the stem base can be read from the lattice. For instance there
are implications between the generators of a concept:

< 4 > AbrB.in.off → SigH.in.off, SpoOE.in.off, Hpr.in.on    (7)

Analogous implications hold for the attributes of the conclusion, and there are
implications between attributes of sub- and superconcepts. < 4 > indicates that
the rule is supported by four transitions.

The bottom concept has an empty extent. Its intent is the set of attributes
never occurring during this small simulation. The top concept does not have an
empty intent, as is often the case, but it consists of 10 attributes common
to all 6 transitions. The corresponding rule has an empty body ($\top$):

< 6 > $\top$ → Signal.in.off, SigA.in.on, SigF.in.off, SpoOA.in.on, SpoOAP.in.off,
SinR.in.off, SinI.in.off, Signal.out.off, SigA.out.on, SigF.out.off,
SpoOA.out.on, SpoOAP.out.off, SinR.out.off, SinI.out.off    (8)

Related to the simulation in the presence of nutritional stress, the transitive
context has about 20 transitions, 500 concepts and 50 implications. In such a
case it is more convenient to query the implicational knowledge base. But also 
for the visualization of large concept hierarchies, there exist more sophisticated 
tools like the ToscanaJ suite [http://sourceforge.net/projects/toscanaj/]. 

3.2 Analysis of All Possible Transitions 

In order to analyse the dynamics of the B. subtilis network exhaustively, we
generated 4224 transitions from all possible $2^{12} = 4096$ initial states (thus the
rules are nearly deterministic). There were 11,700 transitions in the transitive
context, from which we computed the stem base containing 524 implications
with support > 0, but 11,023,494 $\approx 2^{24}$ concepts.

It was not feasible to provide biological evidence for a larger part of the
implications, within the scope of this methodological study. This could be done by
literature search, especially automatic text mining, by new specialized
experiments, or - in a faster, but less reliable way - by comparison with high-throughput
observed time series [18, 3.2]. Instead we will give examples for classes of
implications that can be validated or falsified during attribute exploration in specific
ways. We start with the examples of [16, 4.3].

- "For example, we know that in the absence of nutritional stress, sporulation
should never be initiated [5]. We can use model checking to show this holds
in our model by proving that no reachable state exists with SigF = 1 starting
from any initial state in which Signal = 0, SigF = 0 and SpoOAP = 0." [16,
341] This is equivalent to the rule following from the stem base:

Signal.in.off, SigF.in.off, SpoOAP.in.off → SigF.out.off    (9)

- SigF.out.on → KinA.out.off, SpoOA.out.off, Hpr.out.off, AbrB.out.off:
SpoOAP is reported to activate the production of SigF but also repress its
own production (mutual exclusion) [5].

- SigH.out.off → AbrB.out.off, SpoOE.out.off, SinR.out.off, SinI.out.off:
All these genes are regulated: gene.out = SigA.in + AbrB.in (+ ...).


In our approach, such dependencies and mutual exclusions can be checked
systematically. We searched the stem base for further interesting and simple
implications:

< 4500 > SpoOAP.in.on, KinA.out.off → Hpr.out.off    (10)

< 4212 > SigH.in.on, KinA.out.off → Hpr.out.off    (11)

< 3972 > AbrB.in.off, KinA.out.off → Hpr.out.off    (12)

Hpr and KinA are determined by different Boolean functions, but they are
coregulated in all states emerging from any input state characterized by the single
attributes SpoOAP.on, SigH.on or AbrB.on.

< 3904 > AbrB.out.on → SigA.in.on, SigA.out.on, SigF.out.off,
SpoOA.out.on, SpoOE.out.on, SigH.out.on,
Hpr.out.off, SinR.out.off, SinI.out.off    (13)

AbrB is an important "marker" for the regulation of many genes, which is
understandable from the Boolean rules with hindsight. By a PubMed query, a
confirmation was found for downregulation of SigF together with upregulation
of AbrB [17].

Finally we entered sets of interesting attributes as facts into the PROLOG 
knowledge base, such that a derived implication was computed 6 . Complemen- 
tary to (9), we searched after conditions for the switch towards sporulation 
(SigF. out. on) and found the implication 

SigF.in.off, SpoOAP.in.off, SigF.out.on
-> Signal.in.on, Signal.out.on, SigA.in.on, SigA.out.on, SpoOAP.out.off,
   SpoOA.out.off, AbrB.out.off, KinA.out.off, Hpr.out.off. (14)

The latter four attributes follow immediately from the Boolean rules, but 
SpoOAP.out.off depends in a more complex manner on the premises. It is also 
noteworthy that the class of input states developing to a state with attribute 
SigF.out.on is only characterized by the common attributes Signal.in.on and
SigA.in.on, i.e. the initial presence or absence of no other gene is necessary for
the initiation of sporulation 7.
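
The derivation of an implication as the closure of an attribute set can be
sketched as follows. This is a naive forward-chaining loop in Python, not the
PROLOG program used for the queries, and the list BASE contains only a few
implications quoted in this section rather than the complete stem base, so the
computed closure is smaller than the conclusion of (14). The names closure,
follows and BASE are assumptions made for this illustration.

```python
def closure(attributes, implications):
    """Close an attribute set under (premise, conclusion) pairs."""
    closed = set(attributes)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in implications:
            # Fire an implication whose premise holds and adds something new.
            if premise <= closed and not conclusion <= closed:
                closed |= conclusion
                changed = True
    return closed


def follows(premise, conclusion, implications):
    """True if premise -> conclusion is derivable from the implications."""
    return set(conclusion) <= closure(premise, implications)


# Excerpt only: implications (10)-(12) and one rule quoted in Section 3.
BASE = [
    ({"SpoOAP.in.on", "KinA.out.off"}, {"Hpr.out.off"}),
    ({"SigH.in.on", "KinA.out.off"}, {"Hpr.out.off"}),
    ({"AbrB.in.off", "KinA.out.off"}, {"Hpr.out.off"}),
    ({"SigF.out.on"},
     {"KinA.out.off", "SpoOA.out.off", "Hpr.out.off", "AbrB.out.off"}),
]

QUERY = {"SigF.in.off", "SpoOAP.in.off", "SigF.out.on"}
print(sorted(closure(QUERY, BASE)))
print(follows(QUERY, {"Signal.in.on"}, BASE))
```

With this excerpt, the last four attributes of (14) are already derived from
the single rule with premise SigF.out.on, while Signal.in.on is not derivable
because the implications needed for it are not part of BASE.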

4 Discussion 

The present work translates observations and simulations of discrete temporal 
transitions into the language of formal concept analysis. The application to a 
well-studied gene regulatory network showed how a model can be validated in
a systematic way, by drawing clear and complete consequences from the theory
(the knowledge-based network), and we found interesting new transition rules.



6 and accordingly the closure of the attribute set 

7 For this complete simulation, the conditions Signal = SigA = TRUE had been 
dropped, but they were supposed to be constant. 
The approach could be expanded by accounting for the change of the network
structure itself in strongly different biological situations, e.g. with or without
stress. Thus in ongoing work we adapt a literature-based network to observed
transcriptome time series, resulting in two sets of Boolean functions related to
the stimulation of human fibroblast cells by the cytokines Tnfα or Tgfβ.

Until now we have established the foundation in order to exploit manifold
mathematical results of FCA for the analysis of gene expression dynamics and 
of discrete temporal transitions in general. An important question is: How can 
attribute exploration be split into partial problems, in this special case? For 
instance, one could focus on a specific set of genes first, which is understand- 
able as a scaling [4, Definition 28]. Then the decomposition theory of concept 
lattices will be useful, which permits an elegant description by means of the 
corresponding formal contexts. [4, Chapter 4] 

The price of the logical completeness is its computational complexity. In
this regard the status of attribute exploration has not yet been fully clarified.
Computation time strongly depends on the logical structure of the context, and
there exist cases where the size of the stem base is exponential in the size of
the input [10]. However, deriving an implication from the stem base is possible
in time linear in the size of the base, and the PROLOG queries in Sec-
tion 3.2 were very fast. As demonstrated in Section 2.4, attribute exploration
can be shortened by background knowledge. Furthermore, it will be crucial to
decide implications without the necessity to generate all possible transitions. For that
purpose, model checking [6] could be a promising approach, or the structural and 
functional analysis of Boolean networks by an adaptation of metabolic network 
methods in [9]. There, determining activators or inhibitors corresponds to the 
kind of rules found by our method, and logical steady state analysis indicates 
which species can be produced from the input set and which not. An exciting 
direction of research would be to conclude dynamical properties of Boolean net- 
works from their structure and the transition functions, e.g. by regarding them as 
polynomial dynamical systems over finite fields [12, Section 4] and by exploiting 
theoretical work in the context of cellular automata [12, Section 6]. 
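
To illustrate what regarding Boolean functions as polynomials over a finite
field means, the following Python sketch encodes the Boolean operations as
polynomials over the two-element field and verifies the encoding on all
inputs. The update rule "x and not y" is a hypothetical example, not a rule of
the network studied here, and all function names are assumptions.

```python
from itertools import product


def poly_not(x):
    """NOT x as the polynomial 1 + x over the two-element field."""
    return (1 + x) % 2


def poly_and(x, y):
    """x AND y as the polynomial x * y."""
    return (x * y) % 2


def poly_or(x, y):
    """x OR y as the polynomial x + y + x * y."""
    return (x + y + x * y) % 2


def rule_bool(x, y):
    """Hypothetical Boolean update rule: x and not y."""
    return int(bool(x) and not bool(y))


def rule_poly(x, y):
    """The same rule as the polynomial x * (1 + y) = x + x * y."""
    return poly_and(x, poly_not(y))


# Both representations agree on every input combination.
AGREE = all(rule_bool(x, y) == rule_poly(x, y)
            for x, y in product((0, 1), repeat=2))
print(AGREE)
```

Once every transition function is written in this form, a synchronous Boolean
network becomes a polynomial map on vectors over the field, which is the
setting referred to in [12, Section 4].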

The present work is a first step to use the potential of formal concept analysis 
for solving questions within systems biology. As indicated, many directions of 
research are possible. We encourage their investigation and are open to any 
collaboration with mathematicians, computer scientists or (systems) biologists. 

Acknowledgement. The work was supported by the German Federal Ministry 
of Education and Research BMBF (FKZ 0313652A). 